Version: v1.4.1

Classification - Tool Failure Prediction

Predicting machine failure using operational sensor data and machine characteristics.

Dataset Source: Industrial Equipment Monitoring Dataset
Problem Type: Binary Classification
Target Variable: Machine failure (0 = Normal operation, 1 = Failure)
Use Case: Predictive maintenance for industrial equipment to prevent unexpected downtime and optimize maintenance schedules

Package Imports

!pip install xplainable
!pip install xplainable-client

import pandas as pd
import xplainable as xp
from xplainable.core.models import XClassifier
from xplainable.core.optimisation.bayesian import XParamOptimiser
from xplainable_preprocessing import PipelineSpec, StepSpec, compile_spec
from sklearn.model_selection import train_test_split
import requests
import json

from xplainable_client.client.client import XplainableClient
from xplainable_client.client.base import XplainableAPIError

Xplainable Cloud Setup

# Initialize Xplainable Cloud client
client = XplainableClient(
    api_key="",  # Create an API key in Xplainable Cloud: https://platform.xplainable.io/
    hostname="https://platform.xplainable.io"
)

Data Loading and Exploration

# Load dataset
df = pd.read_csv("https://xplainable-public-storage.syd1.digitaloceanspaces.com/example_data/asset_failure.csv")

# Display basic information
print(f"Dataset shape: {df.shape}")
print(f"Target distribution:\n{df['Machine failure'].value_counts()}")
Out:

Dataset shape: (10000, 14)
Target distribution:
Machine failure
0    9661
1     339
Name: count, dtype: int64
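The counts above show a heavily imbalanced target: 339 failures in 10,000 rows. As a quick sanity check, the sketch below uses plain Python and only the counts printed above to compute the failure rate and the accuracy of a trivial model that always predicts "no failure". The variable names are illustrative.

```python
# Class counts taken from the value_counts() output above
counts = {0: 9661, 1: 339}

total = sum(counts.values())
failure_rate = counts[1] / total        # share of rows labelled as failure
baseline_accuracy = counts[0] / total   # accuracy of always predicting 0

print(f"Failure rate: {failure_rate:.2%}")
print(f"Always-predict-0 accuracy: {baseline_accuracy:.2%}")
```

Because a constant prediction already reaches about 96.6% accuracy, metrics such as precision and recall on the failure class are usually more informative for this problem than accuracy alone.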

df.head()
   UDI  Product ID  Type  Air temperature [K]  Process temperature [K]  Rotational speed [rpm]  Torque [Nm]  Tool wear [min]  Machine failure  TWF  HDF  PWF  OSF  RNF
0    1      M14860     M                298.1                    308.6                    1551         42.8                0                0    0    0    0    0    0
1    2      L47181     L                298.2                    308.7                    1408         46.3                3                0    0    0    0    0    0
2    3      L47182     L                298.1                    308.5                    1498         49.4                5                0    0    0    0    0    0
3    4      L47183     L                298.2                    308.6                    1433         39.5                7                0    0    0    0    0    0
4    5      L47184     L                298.2                    308.7                    1408         40.0                9                0    0    0    0    0    0

Dataset Overview: Machine Failure Prediction

This dataset is designed for predictive maintenance, focusing on machine failure prediction. Below is an overview of its structure and the data it contains:

  1. UDI (Unique Identifier): A column for unique identification numbers for each record.

  2. Product ID: Identifier for the product being produced or involved in the process.

  3. Type: Indicates the type or category of the product or process, with different types represented by different letters (e.g., 'M', 'L').

  4. Air temperature [K] (Kelvin): The temperature of the air in the environment where the machine operates, measured in Kelvin.

  5. Process temperature [K] (Kelvin): The operational temperature of the process or machine, also measured in Kelvin.

  6. Rotational speed [rpm] (Revolutions per Minute): This column shows the speed at which a component of the machine is rotating.

  7. Torque [Nm] (Newton Meters): The torque being applied in the process, measured in Newton meters.

  8. Tool wear [min]: Indicates the amount of wear on the tools used in the machine, measured in minutes of operation.

  9. Machine failure: A binary indicator (0 or 1) showing whether a machine failure occurred.

  10. TWF (Tool Wear Failure): Specific indicator of failure due to tool wear.

  11. HDF (Heat Dissipation Failure): Indicates failure due to ineffective heat dissipation.

  12. PWF (Power Failure): Shows whether a failure was due to power issues.

  13. OSF (Overstrain Failure): Indicates if the failure was due to overstraining of the machine components.

  14. RNF (Random Failure): A column for failures that don't fit into the other specified categories and are considered random.

Each row of the dataset represents a unique instance or record of the production process, with the corresponding measurements and failure indicators. This data can be used to train machine learning models to predict machine failures based on these parameters.

# Define the preprocessing spec
preprocessing_spec = PipelineSpec(steps=[
    StepSpec(
        id="drop_columns",
        type="DropColumnsTransformer",
        params={"columns": [
            "Product ID",  # Identifier - not predictive
            "UDI",         # Unique identifier - not predictive
            "TWF",         # Failure sub-type - data leakage
            "HDF",         # Failure sub-type - data leakage
            "PWF",         # Failure sub-type - data leakage
            "OSF",         # Failure sub-type - data leakage
            "RNF",         # Failure sub-type - data leakage
        ]},
        description="Drop identifier columns and failure sub-type columns (data leakage)"
    ),
])

# Compile and apply the pipeline
pipeline = compile_spec(preprocessing_spec)
df = pipeline.fit_transform(df)
df

try:
    preprocessor_id, preprocessor_version_id = client.preprocessing.create_preprocessor(
        name="Tool Failure Preprocessing",
        description="Drop identifier and failure sub-type columns from the asset failure dataset to prevent data leakage",
        spec=preprocessing_spec.model_dump(),
        sample_df=df,
    )
    print(f"Preprocessor created: {preprocessor_id} (version: {preprocessor_version_id})")
except (XplainableAPIError, ValueError) as e:
    print(f"Error creating preprocessor: {e}")
    preprocessor_id, preprocessor_version_id = None, None
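
The spec comments mark the failure sub-type columns as data leakage. One way to see why is to check how often the target can be reconstructed from those flags alone. The sketch below does this with pandas on a small made-up frame (the rows are illustrative, not taken from the dataset). On the real data the same check would have to run on the raw frame, before the columns are dropped.

```python
import pandas as pd

# Small made-up sample with the same column names as the dataset (illustrative values)
sample = pd.DataFrame({
    "Machine failure": [0, 1, 1, 0, 1],
    "TWF": [0, 1, 0, 0, 0],
    "HDF": [0, 0, 1, 0, 0],
    "PWF": [0, 0, 0, 0, 1],
    "OSF": [0, 0, 0, 0, 0],
    "RNF": [0, 0, 0, 0, 0],
})

subtype_cols = ["TWF", "HDF", "PWF", "OSF", "RNF"]

# A row is flagged if any failure sub-type is set
any_subtype = sample[subtype_cols].any(axis=1).astype(int)

# Share of rows where the flags alone reproduce the target
agreement = (any_subtype == sample["Machine failure"]).mean()
print(f"Target reproduced from sub-type flags in {agreement:.0%} of rows")
```

An agreement close to 100% would mean a model given these columns could read the label off directly instead of learning it from the sensor readings.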

df["Machine failure"].value_counts()
Out:

Machine failure
0    9661
1     339
Name: count, dtype: int64
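
With roughly 3.4% positives, a purely random train/test split can leave the two partitions with slightly different failure rates. The sketch below shows one option, the `stratify` argument of scikit-learn's `train_test_split`, on synthetic labels with the same class counts as above. It is an alternative to consider, not what this tutorial uses.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic labels with the same class counts as the dataset (9661 / 339)
y_demo = np.array([0] * 9661 + [1] * 339)
X_demo = np.arange(len(y_demo)).reshape(-1, 1)  # placeholder feature column

# stratify=y_demo keeps the class proportions approximately equal in both partitions
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.33, random_state=42, stratify=y_demo
)

print(f"Overall failure rate: {y_demo.mean():.4f}")
print(f"Train failure rate:   {y_tr.mean():.4f}")
print(f"Test failure rate:    {y_te.mean():.4f}")
```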

1. Data Preprocessing

X, y = df.drop(columns=['Machine failure']), df['Machine failure']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)

2. Model Optimization

opt = XParamOptimiser()
params = opt.optimise(X_train, y_train)
Out:

100%|██████████| 30/30 [00:02<00:00, 14.47trial/s, best loss: -0.9315957452699021]

3. Model Training

model = XClassifier(**params)
model.fit(X_train, y_train)
Out:

<xplainable.core.ml.classification.XClassifier at 0x2a0d61de0>
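
The tutorial moves straight from training to explanation. Before deploying, it is usually worth checking performance on the held-out split. The sketch below computes precision and recall for the failure class from two label lists in plain Python. The lists are made up for illustration; in practice `y_pred` would come from the fitted model (for example `model.predict(X_test)`, assuming the classifier exposes a scikit-learn style `predict`). The helper name `precision_recall` is hypothetical.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up labels: 3 real failures, the model finds 2 and raises 1 false alarm
y_true = [0, 0, 1, 1, 0, 1, 0, 0]
y_pred = [0, 0, 1, 0, 0, 1, 1, 0]

precision, recall = precision_recall(y_true, y_pred)
print(f"Precision: {precision:.2f}, Recall: {recall:.2f}")
```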

4. Model Interpretability and Explainability

model.explain()

params = {
    "max_depth": 4,
    "min_info_gain": 0.05,
}

model.update_feature_params(
    features=[
        'Rotational speed [rpm]',
        'Tool wear [min]',
        'Air temperature [K]',
        'Process temperature [K]',
        'Torque [Nm]',
    ],
    **params
)
Out:

<xplainable.core.ml.classification.XClassifier at 0x2a0d61de0>

model.explain()

This step shows how hyperparameter tuning affects model interpretability. Here max_depth (which bounds the depth of the per-feature partitioning) and min_info_gain (the minimum information gain a split must provide) are updated for the five numeric features, which recalibrates each feature's score contributions. Calling model.explain() before and after the update visualizes these scores, so the shift in the model's internal logic becomes visible. Comparing the two views helps pinpoint influential features and supports building interpretable, trustworthy machine learning models.
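
To make the role of an information-gain threshold concrete, the sketch below computes the entropy-based information gain of a candidate split in plain Python. It illustrates the general concept only: how xplainable computes and applies `min_info_gain` internally is not shown in this tutorial, and the function names here are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(parent, left, right):
    """Entropy reduction from splitting `parent` into `left` and `right`."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted


parent = [0, 0, 0, 1, 1, 1]
clean_gain = information_gain(parent, [0, 0, 0], [1, 1, 1])  # perfectly separating split
weak_gain = information_gain(parent, [0, 0, 1], [0, 1, 1])   # split that separates poorly

min_info_gain = 0.05  # same value as in the cell above, used here as a plain threshold
for name, gain in [("clean", clean_gain), ("weak", weak_gain)]:
    print(f"{name} split: gain={gain:.3f}, kept={gain >= min_info_gain}")
```

Raising the threshold discards low-gain splits, which tends to produce coarser but easier-to-read feature contributions.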

5. Model Persistence

# Create a model
try:
    model_id, version_id = client.models.create_model(
        model=model,
        model_name="Asset Failure Prediction",
        model_description="Using machine metadata to predict asset failures",
        x=X_train,
        y=y_train
    )
except XplainableAPIError as e:
    print(f"Error creating model: {e}")
    model_id, version_id = None, None  # keep the names defined for later cells
Out:

0%| | 0/6 [00:00<?, ?it/s]

6. Model Deployment

The code block illustrates the deployment of our prediction model using the client.deployments.deploy function. The deployment process involves specifying the unique model_version_id that we obtained in the previous steps. This step effectively activates the model's endpoint, allowing it to receive and process prediction requests. The deployment response confirms the successful deployment with a deployment_id and other relevant information.

try:
    deployment_response = client.deployments.deploy(
        model_version_id=version_id  # Use the version ID returned by create_model above
    )
    deployment_id = deployment_response.deployment_id
except XplainableAPIError as e:
    print(f"Error deploying model: {e}")
    deployment_id = None  # keep the name defined for later cells

Testing the Deployment Programmatically

This section demonstrates the steps taken to programmatically test a deployed model. These steps are essential for validating that the model's deployment is functional and ready to process incoming prediction requests.

  1. Activating the Deployment: The model deployment is activated using client.deployments.activate_deployment, which changes the deployment status to active, allowing it to accept prediction requests.

try:
    client.deployments.activate_deployment(deployment_id=deployment_id)
except XplainableAPIError as e:
    print(f"Error activating deployment: {e}")

  2. Creating a Deployment Key: A deployment key is generated with client.deployments.generate_deploy_key. This key is required to authenticate and make secure requests to the deployed model.

try:
    deploy_key = client.deployments.generate_deploy_key(
        deployment_id=deployment_id,
        description='API key for Tool Failure Prediction',
        days_until_expiry=7
    )
    print(f"Deploy key created: {str(deploy_key)}")
except XplainableAPIError as e:
    print(f"Error generating deploy key: {e}")
Out:

Deploy key created: 76a66348-5af7-471e-9b4a-b18233ce4325

  3. Generating Example Payload: An example payload for a deployment request is generated by client.deployments.generate_example_deployment_payload. This payload mimics the input data structure the model expects when making predictions. Alternatively, a payload can be built from a row sampled from the dataset.

# Choose how to build the payload: 1 = example payload from the API, 2 = a random row sampled from the dataset
option = 2

if option == 1:
    try:
        body = client.deployments.generate_example_deployment_payload(
            model_version_id=version_id
        )
    except XplainableAPIError as e:
        print(f"Error generating example payload: {e}")
        body = []
else:
    body = json.loads(df.drop(columns=["Machine failure"]).sample(1).to_json(orient="records"))

body
Out:

[{'Type': 'L',
  'Air temperature [K]': 300.8,
  'Process temperature [K]': 312.0,
  'Rotational speed [rpm]': 1374,
  'Torque [Nm]': 50.2,
  'Tool wear [min]': 154}]
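
Whichever option is used, the payload is a list of records whose keys should match the columns the model was trained on. A small pre-flight check, sketched below in plain Python, can catch a missing or misspelled column before the request is sent. The helper name `missing_features` and the `EXPECTED_FEATURES` list are illustrative; the list mirrors the keys in the example payload above.

```python
# Feature names as they appear in the example payload above
EXPECTED_FEATURES = [
    "Type",
    "Air temperature [K]",
    "Process temperature [K]",
    "Rotational speed [rpm]",
    "Torque [Nm]",
    "Tool wear [min]",
]


def missing_features(records, expected=EXPECTED_FEATURES):
    """Map each record index to the expected keys it lacks (empty dict if none)."""
    problems = {}
    for i, record in enumerate(records):
        missing = [name for name in expected if name not in record]
        if missing:
            problems[i] = missing
    return problems


example_body = [{
    "Type": "L",
    "Air temperature [K]": 300.8,
    "Process temperature [K]": 312.0,
    "Rotational speed [rpm]": 1374,
    "Torque [Nm]": 50.2,
    "Tool wear [min]": 154,
}]
incomplete_body = [{"Type": "L", "Torque [Nm]": 50.2}]

print(missing_features(example_body))
print(missing_features(incomplete_body))
```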

  4. Making a Prediction Request: A POST request is made to the model's prediction endpoint with the example payload. The model processes the input data and returns a prediction response, which includes the predicted class (pred, e.g., 0 for no failure), the predicted probability (proba), the overall score, and a per-feature score breakdown.

response = requests.post(
    url="https://inference.xplainable.io/v1/predict",
    headers={'api_key': str(deploy_key)},
    json=body
)

value = response.json()
value
Out:

[{'index': 0,
  'id': None,
  'partition': '__dataset__',
  'score': 0.2271685523961836,
  'proba': 0.06946049314245507,
  'pred': 0,
  'support': 331,
  'breakdown': [{'feature': 'base_value',
    'value': None,
    'score': 0.035522388059701496},
   {'feature': 'Type', 'value': 'L', 'score': 0.024408834293742288},
   {'feature': 'Air temperature [K]', 'value': '300.8', 'score': 0.0},
   {'feature': 'Process temperature [K]', 'value': '312', 'score': 0.0},
   {'feature': 'Rotational speed [rpm]',
    'value': '1374',
    'score': 0.18484190508536333},
   {'feature': 'Torque [Nm]', 'value': '50.2', 'score': -0.011265812711409608},
   {'feature': 'Tool wear [min]',
    'value': '154',
    'score': -0.0063387623312138736}]}]
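
In the response above, the per-feature scores in breakdown appear to add up to the top-level score, which is what lets a prediction be read feature by feature. The sketch below checks this on the values shown above and ranks the features by absolute contribution, in plain Python. The interpretation of the fields is inferred from this one response rather than from an API reference.

```python
from math import isclose

# First record of the response shown above (only the fields used here)
record = {
    "score": 0.2271685523961836,
    "breakdown": [
        {"feature": "base_value", "value": None, "score": 0.035522388059701496},
        {"feature": "Type", "value": "L", "score": 0.024408834293742288},
        {"feature": "Air temperature [K]", "value": "300.8", "score": 0.0},
        {"feature": "Process temperature [K]", "value": "312", "score": 0.0},
        {"feature": "Rotational speed [rpm]", "value": "1374", "score": 0.18484190508536333},
        {"feature": "Torque [Nm]", "value": "50.2", "score": -0.011265812711409608},
        {"feature": "Tool wear [min]", "value": "154", "score": -0.0063387623312138736},
    ],
}

# Compare the sum of the contributions with the overall score
total = sum(item["score"] for item in record["breakdown"])
print(f"Sum of breakdown: {total:.6f} vs score: {record['score']:.6f}")
print("Consistent:", isclose(total, record["score"], abs_tol=1e-9))

# Rank features (excluding the base value) by absolute contribution
features = [item for item in record["breakdown"] if item["feature"] != "base_value"]
ranked = sorted(features, key=lambda item: abs(item["score"]), reverse=True)
for item in ranked[:3]:
    print(f"{item['feature']}: {item['score']:+.4f}")
```

For this record, rotational speed contributes most to the score, while both temperature features contribute nothing.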